Reproducible data collection

Aud Halbritter
#| label: setup
#| echo: false
#| eval: true
#| message: false

library(knitr)

Content

PFTC Courses

Design spreadsheet

Spreadsheet content

  • Date, time, observation number
  • Location: region/site
  • Experimental design: block, plot, replicate, number of observations, treatments
  • Organism: species/population/genet
  • Unique ID for sample/observation
  • Response
  • Predictors
  • METADATA: recorder/scribe, weather, notes
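The fields listed above could be laid out as one row per observation. The sketch below is only an illustration: the column names, site and species labels, and the ID scheme (site, block and plot joined by underscores) are assumptions, not a prescribed template.

```r
# Hypothetical two-row data sheet: one row per observation
obs <- data.frame(
  date      = as.Date(c("2023-07-03", "2023-07-03")),
  site      = c("site_a", "site_a"),
  block     = c(1L, 1L),
  plot      = c(1L, 2L),
  treatment = c("control", "warming"),
  species   = c("species_x", "species_x"),
  height_cm = c(12.5, 14.1),          # response
  recorder  = c("AB", "AB"),          # metadata
  notes     = c("", "leaf damaged"),  # metadata
  stringsAsFactors = FALSE
)

# Assumed ID scheme: combine the design columns into one unique ID
obs$sample_id <- paste(obs$site, obs$block, obs$plot, sep = "_")

# anyDuplicated() returns 0 when every ID occurs exactly once
stopifnot(anyDuplicated(obs$sample_id) == 0)
print(obs$sample_id)
```

Building the ID from the design columns makes it possible to trace each sample back to its plot without opening the data sheet.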

Design spreadsheet - manual

Design spreadsheet - digital

Design spreadsheet - data validation
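Validation rules set up in a spreadsheet can also be re-checked after the data are read into R. This is a minimal sketch under assumed rules: the list of allowed sites and the plausible height range (0 to 200 cm) are made up for illustration.

```r
# Hypothetical observations, some with deliberate entry errors
obs <- data.frame(
  site      = c("site_a", "site_b", "site_c", "site_a"),
  height_cm = c(12.5, -3, 40.1, NA),
  stringsAsFactors = FALSE
)

# Assumed validation rules
allowed_sites <- c("site_a", "site_b")
max_height_cm <- 200

# Flag each rule separately so the cause of a problem stays visible
obs$site_ok   <- obs$site %in% allowed_sites
obs$height_ok <- !is.na(obs$height_cm) &
  obs$height_cm > 0 &
  obs$height_cm <= max_height_cm

# Rows that fail at least one rule
problems <- obs[!(obs$site_ok & obs$height_ok), ]
print(problems)
```

Keeping one flag column per rule shows why a row was rejected, which helps when correcting the entry against the field notes.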


Spreadsheets

Discuss with your neighbour: are these good spreadsheets?

Spreadsheet - bad example

Figure 1: Bad example of a spreadsheet.

Long or wide format

Figure 2: Long vs wide format.

Tidy data

Figure 3: Wide (A) and long (B) data table.
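Converting a wide table to a long one can be sketched in base R with reshape(). The table and its column names are hypothetical. tidyr::pivot_longer() is a common alternative, but it is not used here so that the example runs without extra packages.

```r
# Hypothetical wide table: one column per year
wide <- data.frame(
  plot_id     = c("A1", "A2"),
  height_2022 = c(10.2, 8.7),
  height_2023 = c(11.5, 9.1),
  stringsAsFactors = FALSE
)

# Long format: one row per plot and year
long <- reshape(
  wide,
  direction = "long",
  varying   = c("height_2022", "height_2023"),  # columns to stack
  v.names   = "height",                         # name of the value column
  timevar   = "year",                           # name of the new key column
  times     = c(2022, 2023),                    # key value for each stacked column
  idvar     = "plot_id"
)
rownames(long) <- NULL
print(long)
```

In the long table each variable has its own column and each observation its own row.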

Consistency

Consistency - meaningful names

Figure 4: Final doc by PhDcomics.com

Consistency - meaningful names

Which of these are meaningful names?

  • T

  • bird_raw

  • jja1b

  • mean

  • data

  • ddd

Consistency - meaningful names

  • A name can contain letters, numbers, dots and underscores.

  • The first character must be a letter or a dot, and a leading dot must not be followed by a number. Objects whose names start with a dot are hidden: ls() does not list them by default.

  • Avoid special characters (e.g. æ, å, ø, ö)

  • Avoid reserved words (e.g. function, TRUE) and the names of existing functions (e.g. mean).
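The naming rules can be checked in R itself. The sketch below uses make.names(), which rewrites any string that is not a syntactically valid name, so a valid name comes back unchanged. The candidate names are invented for illustration.

```r
# Hypothetical candidate names, chosen only for illustration
candidates <- c("bird_raw", "2nd_site", ".hidden", ".2way", "site name", "TRUE")

# A name is syntactically valid if make.names() returns it unchanged
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)

# Names starting with a dot are hidden from ls() unless all.names = TRUE
env <- new.env()
assign(".hidden", 1, envir = env)
assign("visible", 2, envir = env)
print(ls(env))
print(ls(env, all.names = TRUE))
```

This check does not catch names such as mean: they are valid, but they mask an existing function and are better avoided.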

Consistency - style

Figure 5: Different styles for naming objects. Credit: Allison Horst.
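Whichever style is chosen, it helps to apply it everywhere. As a rough sketch, the hypothetical helper to_snake_case() below converts mixed column names to snake_case with base R. It is not a function from any package, and it only handles simple cases.

```r
# Hypothetical helper: convert mixed naming styles to snake_case
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase: plotID -> plot_ID
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # spaces, dots etc. -> underscore
  x <- gsub("^_+|_+$", "", x)                   # trim leading/trailing underscores
  tolower(x)
}

# Example column names in three different styles
messy <- c("plotID", "Leaf Area", "dry.mass_g")
clean <- to_snake_case(messy)
print(clean)
```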

Consistency - standards

Use global data standards when available.
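One widely used standard is ISO 8601 for dates (YYYY-MM-DD). It is picked here as an example and is not necessarily the standard the slide refers to. The sketch assumes the raw dates were entered as day.month.year.

```r
# Hypothetical raw dates, assumed to be entered as day.month.year
raw_dates <- c("03.07.2023", "14.07.2023")

# Parse with an explicit format so day and month cannot be swapped
parsed <- as.Date(raw_dates, format = "%d.%m.%Y")

# Write the dates back out in ISO 8601 form (YYYY-MM-DD)
iso_dates <- format(parsed, "%Y-%m-%d")
print(iso_dates)
```

ISO dates sort correctly as plain text and remove the day/month ambiguity between regional formats.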
